(54) Method and system for video telephony

(57) A method and apparatus are described for intelligently
acquiring participants in a video telephony system by
distinguishing human faces (101) from other objects in a bit
map image and determining the locations of the faces on the
screen (102). The method then determines which faces are to
be included in the video conference, for example by
prompting the user, and directs a processor to move the
camera to those locations (104-106).
Description

FIELD OF INVENTION 

[0001] This invention relates to video telephony and
more particularly to a method and apparatus for acquisition
of participants in a video telephony session.

BACKGROUND OF INVENTION 

[0002] Video telephony is becoming increasingly popular
and lower in cost, such that its use is no longer limited to
business conferencing but extends to use between
workstations, and it has promise for home use between
families sitting in a living room. A video telephony system
would include a station with a monitor such as a television
set, a video camera, a speaker phone circuit, and a set top
box or CPU for interfacing these elements with each other
and with a communications network to permit the
transmission and reception of voice and video. A video
telephony communication system for workstations is
described, for example, in U.S. Patent No. 4,893,326 of
Duran et al. entitled "Video-Telephone Communications
System". This reference is incorporated herein by reference.
The communications network may be by cable, telephone
network, Internet, wireless, and/or satellite. The present
invention relates to acquisition of participants in a video
conferencing session; in other words, how to tell the camera
on top of the television set or monitor whom to focus on.

SUMMARY OF INVENTION 

[0003] In accordance with an embodiment of the present
invention, an improved method and system for acquisition of
participants in a video telephony session comprises building
a list of human participants and operating the camera to
move and focus by hopping from human to human.

DESCRIPTION OF DRAWING 

[0004] Preferred and exemplary embodiments will 
now be further described in detail by way of example 
only, and with reference to the figures of the accompa- 
nying drawings in which: 

Fig. 1 is a block diagram of the system according to 
an embodiment of the present invention. 
Fig. 2 is a flow chart of the operation in accordance
with an embodiment of the present invention.
Fig. 3 is a block diagram of a system in accordance 
with other embodiments of the present invention. 

DESCRIPTION OF PREFERRED EMBODIMENTS 

[0005] Referring to Fig. 1 there is illustrated an
embodiment of the present invention with a pair of stations
11 and 13 connected by a transmission network 15 such as
cable, telephone and Internet for sending the video and
voice between stations 11 and 13. Each station 11 and 13 is
in a space 17 and 19, respectively, which may be a living
room. The station equipment includes a camera 21 on top of a
monitor 22 such as a television set, a speaker phone circuit
23 (microphone 23a and speaker 23b), a remote control 25 and
a computer processing unit (CPU) such as a set top box 27
for interfacing these elements with each other and with the
communications network 15. The camera 21 would have a drive
motor 21a for moving the camera and/or camera lens to focus
on objects in the room. The drive motor 21a would move in
both horizontal and vertical directions as well as in and
out to focus on the objects. The camera may be controlled by
the remote control 25 via the computer processing unit by a
track ball, mouse or keyboard clicks on the remote that move
the screen view up/down and left/right. This is not in
accordance with a preferred embodiment of the present
invention.
[0006] In order to avoid this cumbersome method an
improved method and system is provided herein for hopping
from human to human. The space 17 or 19 may be an enclosed
or otherwise defined space such as a living room, conference
room, workstation room or even an open air space with a well
defined camera view background. The space contains
properties which include static objects such as furniture
and plants, and other static and distinct parts of the
enclosure, such as the windows and doors of that space,
during video conferencing.
[0007] The camera and processor build a static model of
the space and the static objects in it. This takes place as
an invisible, background process relative to content being
displayed on the television or monitor. This is a program in
the CPU called for example "BUILD_STATIC_MODEL". Another
program for displaying the static model, called for example
"DRAW_STATIC_MODEL", renders the full screen with the
appliance on it and static object below it. Another program
in the CPU is a default static object to provide a default
background. The CPU includes a program called
"LOCATE_PERSON(S)" that locates the faces of person(s) in
the space. The program called "DEFAULT_STATIC_OBJECT" sets
the camera in a default position when being powered up. This
may be for example the closest object along the camera's
centerline. This can be, for the example of the living room,
the center of the sofa in front of the television set or,
for a workstation, the nominal chair location. The viewer
can designate any static object. The objects further include
the remote controller and persons taking part in the video
telephony session and located in the space which contains
the appliance or station equipment. The object may also be a
"default person" who is the person located at (for example
sitting on) the default static object. The objects are
stored in the memory of the CPU and called upon by the CPU.
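
The power-up default described above can be sketched as
follows. This is a minimal illustration only, assuming each
stored static object carries a lateral offset from the
camera's centerline and a distance from the camera. The
StaticObject class, its field names, and the tolerance_m
threshold are hypothetical and are not taken from the
described system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StaticObject:
    """Hypothetical record for one entry of the static model."""
    name: str
    offset_from_centerline_m: float  # lateral distance from the camera's centerline
    distance_m: float                # distance from the camera


def default_static_object(
    objects: Sequence[StaticObject], tolerance_m: float = 0.5
) -> Optional[StaticObject]:
    """Return the closest object lying along the camera's centerline.

    An object counts as "along the centerline" when its lateral offset is
    within tolerance_m (an assumed threshold). Returns None when no object
    qualifies.
    """
    on_axis = [o for o in objects if abs(o.offset_from_centerline_m) <= tolerance_m]
    return min(on_axis, key=lambda o: o.distance_m, default=None)


# Example living room: the sofa is on the centerline and nearer than the window.
room = [
    StaticObject("sofa", 0.1, 3.0),
    StaticObject("plant", 1.8, 1.5),
    StaticObject("window", -0.2, 5.0),
]
chosen = default_static_object(room)
```

In this example the plant is nearer to the camera but lies
off the centerline, so the sofa becomes the default object.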

[0008] In accordance with one embodiment of the present
invention the system builds a static model by periodically
scanning the space and the static objects as indicated by
step 100. When the camera is powered up, the closest object
along the camera's centerline is usually selected as a start
reference point; this is also part of step 100. The program
in the CPU identifies the human faces from the camera's
overall bit image as illustrated in step 101 in Fig. 2. The
users' images may be on an object file and compared with the
bit map to identify who is on the screen. The system
includes the program that identifies the locations of the
faces on the screen as illustrated in step 102. An example
of such software is as in Henry Rowley's face detection
thesis described in
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/har/Web/faces.html.
The system then prompts the user in step 104 to answer
whether the face is to be included in the video session, by
a message on the display or otherwise with the query
"Include in video session?", while highlighting (step 103)
the face of the person to whom the question is addressed.
The system can begin by starting with the person closest to
the nominal position in the room (orthogonal to the center
of the television or monitor screen plane). By clicking a
mouse, or a "yes" or enter key on the keyboard, the holder
of the remote tells the CPU or set top box to include the
highlighted person. The system then goes to the next person,
highlights that person at step 103 and queries again at step
104 whether that person is to be included. The highlighting
and prompting repeat until it has been determined for all
faces whether they will be in the video conference. A done
or escape key is pressed and the selection is finished. This
is represented by step 105. Alternatively, a next or arrow
key skips the currently highlighted person and moves to
highlight the next one, again with a prompt for the next
person. The system is driven by the viewer's remote clicks
on a TV screen-displayed picture, and the software
correlates the remote's cursor position on the screen with
the location of the faces shown on the screen. The camera
then adjusts (zoom, pan and tilt) to include only those
persons, to thereby move from human to human. The set of
persons can be changed, enlarged or cut down in size at any
time during the videophone session.
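
The highlight-and-prompt loop of steps 103 to 105, the
correlation of the cursor position with a face, and the
region the camera must cover can be sketched as below. This
is an illustration under stated assumptions, not the
described implementation: faces are assumed to arrive as
rectangular bounding boxes in screen pixels, and the Face
class, the ask callback, and all function names are
hypothetical. Face detection itself and the motor commands
are outside the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Face:
    """Hypothetical face record: a bounding box in screen pixels."""
    label: str
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def face_at_cursor(faces: Sequence[Face], px: int, py: int) -> Optional[Face]:
    """Correlate a cursor position with the face displayed there, if any."""
    for face in faces:
        if face.contains(px, py):
            return face
    return None


def select_participants(
    faces: Sequence[Face], screen_center_x: float, ask: Callable[[Face], str]
) -> List[Face]:
    """Highlight each face in turn and prompt for inclusion.

    ask(face) stands in for highlighting (step 103) plus the on-screen prompt
    (step 104) and must return "yes", "next" or "done" (step 105).
    """
    # Start with the face closest to the nominal (screen center) position.
    ordered = sorted(faces, key=lambda f: abs(f.x + f.w / 2 - screen_center_x))
    included: List[Face] = []
    for face in ordered:
        answer = ask(face)
        if answer == "done":
            break
        if answer == "yes":
            included.append(face)
    return included


def framing_box(included: Sequence[Face]) -> Optional[Tuple[int, int, int, int]]:
    """Smallest box (x, y, w, h) covering every included face.

    The camera's pan, tilt and zoom would be driven toward this region.
    """
    if not included:
        return None
    left = min(f.x for f in included)
    top = min(f.y for f in included)
    right = max(f.x + f.w for f in included)
    bottom = max(f.y + f.h for f in included)
    return (left, top, right - left, bottom - top)


faces = [Face("A", 100, 80, 60, 60), Face("B", 300, 90, 60, 60), Face("C", 500, 70, 60, 60)]
answers = {"A": "next", "B": "yes", "C": "yes"}
included = select_participants(faces, 320, lambda f: answers[f.label])
target = framing_box(included)
```

Here the prompt order is B, A, C because B is nearest the
screen center; A is skipped, and the resulting box spans
only B and C.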

[0009] In accordance with another embodiment software in
the CPU or set top box identifies persons by name and not
just faces. Each person's face is tagged on the screen with
the CPU recorded name identified in a training session, and
each person is thereby identified by name instead of by face
alone. The training session is carried out for each family
member, for example after purchase of the equipment. The
names of the people to be included in the session are called
out.
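
Selecting participants by name can be sketched as a lookup
from called-out names to tagged face locations. This is a
minimal sketch, assuming the training session has already
produced a mapping from each recorded name to a face
bounding box and that the called-out names have already been
converted to text. The function name and the
case-insensitive matching rule are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in screen pixels


def faces_for_names(
    tagged: Dict[str, Box], called_out: Iterable[str]
) -> Tuple[List[Box], List[str]]:
    """Return the boxes of the named participants and the names with no tag.

    Matching ignores case and surrounding whitespace (an assumed rule).
    """
    normalized = {name.strip().lower(): box for name, box in tagged.items()}
    found: List[Box] = []
    unknown: List[str] = []
    for name in called_out:
        box = normalized.get(name.strip().lower())
        if box is None:
            unknown.append(name)
        else:
            found.append(box)
    return found, unknown


tagged = {"Alice": (100, 80, 60, 60), "Bob": (300, 90, 60, 60)}
found, unknown = faces_for_names(tagged, ["bob", "Carol"])
```

Names that match no tag are returned separately so that the
caller can fall back to the highlight-and-prompt selection
for them.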
[0010] In accordance with another embodiment of the
present invention, the system provides a private
conversation with someone at the other end of the
videophone. See Figure 3. This may be done in the "Whisper"
mode. From a screen menu on the local end, at living room 11
for example, the user A desiring to go into the "Whisper"
mode from the normal mode selects the "Whisper" mode on the
remote 25 and designates a desired target person B in the
living room 13 as the "Whisper" mode target by hopping from
face to face as discussed above. This is done while the user
A is viewing the other end of the link at the living room
13, for example. The face of that person B is either
highlighted, or the others are removed from the screen, or
the face is otherwise indicated, and then selected. The
person is then selected as the "Whisper" mode target. The
video camera 21 in room 13 then focuses on the target person
B. The system performs an identification search. The whisper
person's identification and contact address or phone number
may be preloaded in the memory of box 27, and when the
person is highlighted or selected a private telephone line
number is made available. The videophone may be equipped
with a set-top box 27 having Computer Telephony Integration
(CTI) capabilities, i.e. the ability to dial POTS (Plain Old
Telephone Service) and hook up the videophone mike and
speakers into a private telephone line. The system, when in
the "Whisper" mode and having designated the person,
automatically calls his or her cellphone or private line 31
and from his or her cellphone or private line 33, and
diverts (switches) the user's videophone mike and speakers
out of the shared audio medium into a private conversation
toward the target's cellphone or private line off the
set-top box. At any time the user desires to end the
conversation in the "Whisper" mode, an escape key on the
remote 25 is provided to return to the normal mode. The
escape also happens if the remote target hangs up on his or
her cellphone or private line.
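
The mode switching described for the "Whisper" mode can be
sketched as a small state machine. This is an illustration
only: the WhisperController class, its directory of
preloaded private numbers, and the logged action strings are
hypothetical stand-ins for the set-top box memory, the
dialing, and the audio switching, none of which is specified
at this level of detail in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WhisperController:
    """Hypothetical controller for switching between normal and whisper mode.

    directory maps a person's identification to a preloaded private number.
    log records the actions a real set-top box would carry out.
    """
    directory: Dict[str, str]
    mode: str = "normal"
    target: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def enter_whisper(self, person: str) -> bool:
        """Designate a target; dial and divert audio if a private number is on file."""
        number = self.directory.get(person)
        if number is None:
            return False  # no private line preloaded, so remain in normal mode
        self.mode, self.target = "whisper", person
        self.log.append(f"dial {number}")
        self.log.append("divert mike and speakers to private line")
        return True

    def leave_whisper(self, reason: str) -> None:
        """Return to normal mode on the escape key or when the target hangs up."""
        if self.mode != "whisper":
            return
        self.log.append(f"restore shared audio ({reason})")
        self.mode, self.target = "normal", None


controller = WhisperController({"B": "555-0100"})
entered = controller.enter_whisper("B")
controller.leave_whisper("escape key")
```

Both exits named in the text, the escape key and the target
hanging up, go through the same leave_whisper path so that
the shared audio is always restored.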

[0011] In accordance with another embodiment the system
provides a private view of a chosen person without notice.
This may be provided in the voyeurism mode. This may also be
selected by the remote 25. The capabilities discussed above
are used to designate a target person by hopping from face
to face, such as by highlighting, when viewing the other end
of the link at the living room 13 in the example. The camera
21 at the other end (room 13) zooms and focuses on the
designated target person (B in the example). This zooming
can be done by "solid state" zooming so that the motion of
the camera will not be apparent to the target person.
Another alternative is that the mechanical servo, cam, etc.
is hidden behind an opaque and static glass screen. If the
remote end has a small picture-within-a-picture of the local
user's view, the user's camera (camera 21 in room 11 for the
example) may output a freeze frame of the previous (before
voyeurism selection) global view of all the others at the
remote end. An escape from the voyeurism mode is provided by
keying the remote 25.
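
One possible reading of "solid state" zooming is a digital
zoom, in which the region around the target face is cropped
from the full bit map rather than the camera being moved.
That reading is an assumption, as are the function name, the
frame representation (a list of pixel rows), and the margin
parameter in the sketch below.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # rows of pixel values


def crop_to_face(
    frame: Sequence[Sequence[int]], box: Tuple[int, int, int, int], margin: int = 0
) -> Frame:
    """Return the sub-image around box (x, y, w, h), clamped to the frame edges."""
    x, y, w, h = box
    height = len(frame)
    width = len(frame[0]) if height else 0
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(width, x + w + margin)
    bottom = min(height, y + h + margin)
    return [list(row[left:right]) for row in frame[top:bottom]]


# A 4-row by 6-column test image whose pixel value encodes row and column.
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
zoomed = crop_to_face(frame, (2, 1, 2, 2))
```

The cropped region would then be scaled up for display; the
scaling step is omitted here.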



Claims 

1. A method of acquisition of participants in a video
telephony session comprising the steps of:

building a visual enumeration list of humans in
the video telephony session for the camera to
focus on;
determining locations of the humans; and
controlling the camera to hop directly from human
to human.

2. The method of Claim 1 wherein said building step
includes highlighting a human and prompting users
to identify if that human is to be included.

3. The method of Claim 1 wherein each person's face
is tagged in a training session and the humans to
be included are called out or otherwise determined
by the tag.

4. The method of any preceding claim wherein the
locations of the human faces are determined and
stored.

5. The method of any preceding claim wherein the
building step includes comparing a stored bit map
of the faces of participants with a received bit map
from the camera and the locating step determines
the locations of the faces in the image.

6. The method of any preceding claim wherein the
camera includes a drive circuit responsive to the
stored locations for driving the camera to focus on
the faces.

7. The method of any preceding claim including the
step of designating a target person in a whisper
target mode, and diverting videophone mike and
speakers out of shared audio to private
conversation.

8. The method of Claim 7 including the step of
automatically calling designated person's private phone
when designating a target person as the whisper
target.

9. The method of Claim 8 wherein the designated
target person's cellphone is called.

10. The method of Claim 7 wherein said designating
step includes highlighting the target person on the
video screen.

11. The method of Claim 7 wherein said designating
step includes removing all other humans on the
screen but the target person.

12. The method of Claim 7 including the step of
escaping from the whisper mode using a remote.

13. The method of any preceding claim including the
step of a voyeurism mode designating a target
person for viewing without notice.

14. The method of Claim 13 wherein said camera on
the other end zooms on the target person for
viewing.

15. The method of Claim 13 wherein the target person's
view of user's view only has a freeze frame view of
user's view before going into the voyeurism mode.

16. The method of Claim 13 including the step of
escaping from the voyeurism mode using a remote.